output distribution
Natural-Parameter Networks: A Class of Probabilistic Neural Networks
Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately, they are often prone to overfitting when training data is insufficient. One effective way to alleviate this problem is to take a Bayesian approach using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN.
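As a rough illustration of the idea of treating weights and neurons as distributions, the sketch below propagates a Gaussian (mean and variance) through one linear layer with independent Gaussian weights by moment matching. The function name and shapes are hypothetical; this is a toy sketch of closed-form distribution propagation, not the authors' NPN implementation.

```python
import torch

def gaussian_npn_linear(x_mean, x_var, w_mean, w_var):
    """Propagate a Gaussian input through a linear layer with
    independent Gaussian weights (moment matching).

    For independent W and x: E[Wx] = E[W]E[x] and
    Var(Wx) = Var(W)Var(x) + Var(W)E[x]^2 + E[W]^2 Var(x).
    """
    out_mean = x_mean @ w_mean
    out_var = (x_var @ w_var
               + x_var @ (w_mean ** 2)
               + (x_mean ** 2) @ w_var)
    return out_mean, out_var

# Toy usage: batch of 8 inputs, 16 -> 4 units.
x_m, x_v = torch.randn(8, 16), torch.rand(8, 16)
w_m, w_v = torch.randn(16, 4), torch.rand(16, 4) * 0.1
y_m, y_v = gaussian_npn_linear(x_m, x_v, w_m, w_v)
```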
Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness
Ensemble approaches for uncertainty estimation have recently been applied to the tasks of misclassification detection, out-of-distribution input detection, and adversarial attack detection. Prior Networks have been proposed as an approach to efficiently emulate an ensemble of models for classification by parameterising a Dirichlet prior distribution over output distributions.
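To make the Dirichlet parameterisation concrete, here is a hedged sketch of a reverse-KL objective between a predicted and a target Dirichlet, using PyTorch's registered Dirichlet-Dirichlet KL. The exponential link, the +1 offset, and the example target concentrations are illustrative assumptions, not the paper's exact recipe.

```python
import torch
from torch.distributions import Dirichlet, kl_divergence

def reverse_kl_loss(pred_logits, target_alphas):
    """Minimise KL(Dir(alpha_pred) || Dir(alpha_target)) -- the
    'reverse' direction relative to the forward KL of the original
    Prior Networks objective. Targets would be sharp for in-domain
    labels and flat (all ones) for out-of-distribution inputs.
    """
    alphas = pred_logits.exp() + 1.0  # keep concentrations > 1
    return kl_divergence(Dirichlet(alphas), Dirichlet(target_alphas)).mean()

# Toy usage: 3-class problem, batch of 2.
logits = torch.randn(2, 3, requires_grad=True)
targets = torch.tensor([[100.0, 1.0, 1.0],   # confident in-domain target
                        [1.0, 1.0, 1.0]])    # flat target for an OOD input
loss = reverse_kl_loss(logits, targets)
loss.backward()
```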
Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models
Large language models are probabilistic models, and the process of generating content is essentially sampling from the output distribution of the language model. Existing watermarking techniques inject watermarks into the generated content without altering the output quality. On the other hand, existing acceleration techniques, specifically speculative sampling, leverage a draft model to speed up the sampling process while preserving the output distribution. However, there is no known method to simultaneously accelerate the sampling process and inject watermarks into the generated content. In this paper, we investigate this direction and find that the integration of watermarking and acceleration is non-trivial. We prove a no-go theorem, which states that it is impossible to simultaneously maintain the highest watermark strength and the highest sampling efficiency. Furthermore, we propose two methods that maintain either the sampling efficiency or the watermark strength, but not both. Our work provides a rigorous theoretical foundation for understanding the inherent trade-off between watermark strength and sampling efficiency in accelerating the generation of watermarked tokens for large language models. We also conduct numerical experiments to validate our theoretical findings and demonstrate the effectiveness of the proposed methods.
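For context, one accept/reject step of standard speculative sampling looks roughly like the sketch below: the accepted token is distributed exactly according to the target model, which is the distribution-preservation property that conflicts with maximal watermark strength. Function and variable names are ours, not from the paper.

```python
import torch

def speculative_step(p_target, q_draft, x):
    """One accept/reject step of speculative sampling (a sketch).

    x is the token proposed by the draft model; p_target and q_draft
    are the two models' probability vectors over the vocabulary. The
    returned token is distributed exactly according to p_target.
    """
    accept_prob = torch.clamp(p_target[x] / q_draft[x], max=1.0)
    if torch.rand(()) < accept_prob:
        return x  # keep the draft token
    # Otherwise resample from the residual distribution.
    residual = torch.clamp(p_target - q_draft, min=0.0)
    residual = residual / residual.sum()
    return torch.multinomial(residual, 1).item()

# Toy usage over a 5-token vocabulary.
p = torch.tensor([0.4, 0.3, 0.1, 0.1, 0.1])
q = torch.tensor([0.2, 0.5, 0.1, 0.1, 0.1])
x_draft = torch.multinomial(q, 1).item()
token = speculative_step(p, q, x_draft)
```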
Localization with Sampling-Argmax
The soft-argmax operation is commonly adopted in detection-based methods to localize the target position in a differentiable manner. However, training the neural network with soft-argmax leaves the shape of the probability map unconstrained. Consequently, the model lacks pixel-wise supervision through the map during training, leading to performance degradation. In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints on the shape of the probability map by minimizing the expectation of the localization error. To approximate the expectation, we introduce a continuous formulation of the output distribution and develop a differentiable sampling process. The expectation can be approximated by calculating the average error of all samples drawn from the output distribution. We show that sampling-argmax can seamlessly replace the conventional soft-argmax operation on various localization tasks. Comprehensive experiments demonstrate the effectiveness and flexibility of the proposed method.
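A minimal sketch of the underlying objective: instead of supervising only the soft-argmax point estimate, minimise the expected localization error under the probability map. For brevity this computes the discrete expectation in closed form rather than the paper's continuous formulation with differentiable sampling; names and shapes are assumptions.

```python
import torch

def expected_error_loss(heat_logits, coords, target):
    """Expected localization error under the output distribution.

    heat_logits: (B, N) unnormalised scores over N candidate positions
    coords:      (N, 2) their pixel coordinates
    target:      (B, 2) ground-truth locations
    Minimises E_{y ~ p}[ ||y - target|| ], which constrains the shape
    of the probability map, not just its argmax.
    """
    probs = torch.softmax(heat_logits, dim=-1)                        # (B, N)
    dists = (coords.unsqueeze(0) - target.unsqueeze(1)).norm(dim=-1)  # (B, N)
    return (probs * dists).sum(dim=-1).mean()

# Toy usage: an 8x8 map flattened to N = 64 positions.
ys, xs = torch.meshgrid(torch.arange(8.0), torch.arange(8.0), indexing="ij")
coords = torch.stack([xs.flatten(), ys.flatten()], dim=-1)            # (64, 2)
logits = torch.randn(4, 64, requires_grad=True)
target = torch.rand(4, 2) * 7
loss = expected_error_loss(logits, coords, target)
loss.backward()
```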